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Understanding the patterns of mobility of individuals is crucial for a number of reasons, from city planning 
to disaster management. There are two common ways of quantifj^ing the amount of travel between locations: 
by direct observations that often involve privacy issues, e.g., tracking mobile phone locations, or by 
estimations from models. Typically, such models build on accurate knowledge of the population size at each 
location. However, when this information is not readily available, their applicability is rather limited. As 
mobile phones are ubiquitous, our aim is to investigate if mobility patterns can be inferred from aggregated 
mobile phone call data alone. Using data released by Orange for Ivory Coast, we show that human mobility is 
well predicted by a simple model based on the frequency of mobile phone calls between two locations and 
their geographical distance. We argue that the strength of the model comes from directly incorporating the 
social dimension of mobility. Furthermore, as only aggregated call data is required, the model helps to avoid 
potential privacy problems. 

People travel and move for a variety of reasons, including social, economic, and political factors. While 
individuals may follow simple, recurrent patterns of movement, e.g., daily commuting, a more complex 
picture emerges when all trajectories of a population are assembled together \ Understanding the principles 
governing individual and collective movement is important for a number of reasons: for planning urban design^, 
for forecasting and avoiding traffic congestion^ for mitigating infectious disease^"^, and for contingency planning 
in extreme situations caused by disasters^'^. However, accurately determining the movement patterns in a 
population is cumbersome and costly, and involves privacy issues. 

There are two ways of inferring the mobility patterns in a population: by direct measurement or by models that 
predict population movement based on other observed data. Regarding the former, tracking the movement of 
individuals using location data from mobile phones^" has emerged as a powerful alternative to traditional 
methods such as traffic surveys^^. In this case, the data set comes from the billing systems of mobile phone 
operators, where the closest tower of each phone is recorded when a mobile phone is used. The resolution 
problems caused by this are compensated by the large quantity and high quality of data^^'^^. However, there 
are drawbacks to this approach: tracking the locations of individuals maybe seen as a threat to privacy even when 
the data is properly anonymised^^. 

The alternative approach to direct measurement is to use models that predict the average population behaviour 
from (publicly) available information, such as census and population data. Perhaps the most famous example is 
the gravity modeP^"^^ that has been used to predict the intensity of a number of human interactions, including 
population movement^^"^^ and mobile phone calls between cities^^. In the gravity model, the intensity of inter- 
actions between two locations (e.g., cities) is determined by their populations and distance (with proper scaling 
exponents). Recently, it has been shown that a parameter- free model, the radiation modeP^, is able to predict 
mobility patterns with improved accuracy; this model requires geospatial information on population size as an 
input. 

The applicability of the above-mentioned models is constrained by the availability of accurate population 
information. This may become a problem e.g. for developing countries, where census data may be incomplete. 
However, mobile phones are ubiquitous almost everywhere, and one might expect that mobile phone calls reflect 
the social dimension of mobility - the amount of social ties between geospatial locations can be expected to 
influence travel patterns. Therefore, the aim of this paper is to predict mobility patterns from mobile phone call 
data alone, and examine models that would be applicable in a setting where accurate, up-to-date population 
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information is not available. Furthermore, we focus on models that 
only require aggregated call data, without needing to track individual 
users. This has the obvious benefit of mitigating privacy- related 
issues; additionally, the volume of required input data is smaller 
and the aggregation can be easily done by the mobile operator that 
owns the source data. 

Our modelling and analysis is purely based on the Ivory Coast 
mobile telephone data set^^, originally released by Orange for the 
Data for Development Challenge. This data set includes information 
on mobile phone calls aggregated at the tower level during 140 days, 
used as inputs for the models, and data on the trajectories of ran- 
domly chosen individuals, used for developing the models and test- 
ing their accuracy. There is no accurate, up-to-date geospatial 
population information for Ivory Coast; the last census was con- 
ducted in 1998, and there is no data available on mobility or migra- 
tion within the country. In contrast, the telephone system in Ivory 
Coast is well-developed by African standards with mobile phone 
penetration above 83%^^. 

This paper is constructed as follows: first, we examine gravity laws 
for average mobility and call frequency between locations. We then 
proceed to show that mobility between two locations can be directly 
estimated from the number of calls between the locations and their 
distance. This holds at two levels of coarse-graining: between tower 
locations in a major city and between cities. Finally, we study the 
accuracy of predictions for individual pairs of locations, beyond 
averages, and show that the number of calls between locations 
appears to be a good predictor of the frequency of travel between 
them. For reference, we also study variants of existing mobility mod- 
els (the gravity and radiation models) where location-specific call 
frequencies are used as inputs instead of population data; despite 
applying these models beyond their intended range, they provide 
fairly good predictions on average. 

Results 

Data set and coarse-graining. The data set comes in two parts: (i) 
the number of calls between 1231 Orange towers in Ivory Coast for 5 
months, and (ii) ten data sets on two -week individual trajectories of 
50,000 randomly chosen users. From the trajectories, we aggregated 
the mobility m^y between locations / and j by counting direct 
movements along the trajectories (see Methods for further details). 

As it is reasonable to assume that communication and mobility 
patterns are in general different for short and long distances, we 
aggregated the data at two levels: (i) tower level for intra-city beha- 
viour and (ii) city level for inter-city behaviour. The intra-city ana- 
lysis consist of 5.1 million movements and 109 million calls between 
all 298 towers located inside Abidjan, the largest city of Ivory Coast, 
during 140 days. This comprises 31% of all calls and 50% of all 
movements in the country. In this analysis the geographical unit - 
referred to as "location" in the following - is the area covered by a 
single tower. To analyse inter-city behaviour, we aggregated towers 
that lie within a city boundary and consider calls and mobility 
between cities. The resulting data contains 143 cities with 63 million 
calls and 374 thousand movements between them during 140 days. 
At both levels of analysis, we determine the number of calls, move- 
ments, and the geographical distance between every pair of locations 
(towers, cities). See Methods for further details. 

Gravity laws: dependence of mobility and communication inten- 
sity on distance. We begin by investigating whether the mobility and 
communication intensities between two locations follow the gravity 
law on average. In its general form, the gravity law states that 



where Xy is the intensity of interaction, e.g., calls, mobility, trade, 
between locations / and j associated with populations of sizes Ni 



and Np separated by a distance dfj^^'^^. The exponent a governs the 
distance dependence. Note that in the most general form of the 
gravity law, Nf and Nj are also associated with an exponent; here 
for simplicity we assume a linear dependence. For our data, we 
study the intensities of mobility rriij and communication 
between locations / and j. These are defined as the average number 
of weekly movements and calls between them, respectively. As a 
proxy of the population Ni, we take the total number of weekly 
calls Si made and received at location /. 

The variation of the scaled mobility intensity, niijlSiSp with respect 
to the distance dij is shown in Fig. 1 for the tower and city levels of 
coarse-graining (panels A and B, respectively). In both cases, the 
gravity law holds on average and 

where y ~ 2.14 for the intra-city level and y ~ 2.54 for the inter-city 
level. Panels C and D display a similar plot for the scaled commun- 
ication intensity that is also seen on average to follow the gravity law: 

where the distance exponents are ^ ~ 1.20 for the intra-city level and 
^ ~ 1.48 for the inter-city level. It is worth noting that both expo- 
nents y and d are smaller for the intra-city level, indicating differ- 
ences in communication and travel patterns within and between 
cities: within a city, the spatial distance appears to play a less import- 
ant role than it does between cities. 

The two gravity laws discussed above suggest that the following 
relationship might also hold: 

where P = y — 3. This is indeed the case, as seen in Fig. 1 (E,F) where 
(mij/Cij) follows a power-law dependence on dij. For both intra- and 
inter- city levels, we find the exponent j5 ~ y — ^ (see Table I). These 
results suggest that there are two possible ways of inferring the 
intensity of mobility between locations / and j from call data: using 
the distance and either (i) the total call numbers at both locations S/ 
and Sj (Eq. 2), or (ii) the total number of calls between the locations Cij 
(Eq. 4). The prediction accuracy of these two models will be assessed 
in in the section "Prediction accuracy" below. 

It is worth noting that both for intra- and inter-city levels, the 
exponent j5 ~ 1. This does not directly result from Eqs. (2) and 
(3). One possible argument for the observed value of j5 is as follows: 
the cost of a single trip, measured in e.g. time or money, between two 
towers/cities / and j can be assumed to depend linearly on their 
distance, dij. This means that the total cost of all movements between 
/ and j is proportional to niijdij. However, the cost of communication 
is independent of distance. If one further assumes that the total cost 
of movement is balanced by the total benefit brought by social ties, 
linearly reflected in Cy, we have niijdij ~ Cy and thus the value of 
exponent P = 1. In this interpretation, the communication exponent 
3 is directly related to a decrease in the number of social ties as 
function of distance, whereas y captures a combination of cost assoc- 
iated with travel and the decrease in the number of social ties. 

Models for estimating mobility based on call data. The results of 
the previous section indicate that on average, the mobility intensity 
niij between two locations / and j can be estimated using the gravity 
model 

m^ = k^f, (5) 

where /c^ is a normalization constant obtained by equating the 
total numbers of expected and observed movements, i.e.. 
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Figure 1 | Dependence of the intensities of interaction on distance. The number of (A,B) movements per strength product mi/SiSp (C,D) calls per 
strength product Cij/siSp and (E,F) movements per call my/Cy decrease with distance between i and; for both intra-city and inter-city analyses. Each grey dot 
indicates a pair of locations, and circles correspond to the average log-binned behaviour. Solid lines show the fitted power-law decaying behaviour. 



y^ .. rriij = ^ .. . This model takes the communication intensities 

Si and Sj at both locations as inputs in addition to the distance dy. As 
an alternative we propose the communication model 



ml- 



(6) 



based on the communication intensity C/y between the locations. The 
normalization constant /c^ is obtained as before. The values of the 
exponents y and P are taken from Table I. 

For comparison, we also study a modified version of the radiation 
modeF^, originally designed to predict mobility between locations / 
and j with the help of data on population density in the surrounding 
area. Again, we modify the model such that only call and distance 
data is required as input. To this end, we assume that the number of 
calls in a given location is an unbiased estimate of population density, 
similarly to the gravity model. Note that this assumption may not 
necessarily hold, since mobile phone penetration may correlate with 
socioeconomic factors. Further, we assume that the number of trips 
that begin (end) at location / (j) is proportional to S/ (sj). Then, the 
radiation model formula can be rewritten as 



SiSj 



{Si + Sij) {Si + Sj + Sij) {sj + Sji) {sj + Si + Sji) 



(7) 



Prediction accuracy. To assess the actual predictive power of the 
models beyond averages, we compare the actual mobility intensity 
mij, obtained from the trajectory data set, with the estimates given by 
the models for each specific pair of locations / and j. This comparison 
for the communication model, the gravity model, and the radiation 
model is shown in Fig. 2. The gray dots correspond to predicted 
versus actual mobility for each pair of locations, and the boxes 
(whiskers) correspond to the region between 25th and 75th (9th 
and 91st) percentiles. 

It is clear from the figure that all models give on average reasonable 
predictions. However, the gravity and radiation models display 
higher levels of variance between the predicted and actual mobility 
intensities. In particular, the prediction accuracy of the gravity model 
is relatively poor for the inter- city mobility, and the radiation model 
performs the worst for the intra-city mobility. The latter is not sur- 
prising, as the radiation model was originally not designed for pre- 
dicting short-range travel patterns within cities. Further, the original 
radiation model requires accurate geospatial population informa- 
tion, and simply equating population size within an area with the 
number of calls can be expected to give rise to errors. 

The level of observed variance implies that in addition to compar- 
ing averages, it is important to compare the expected and observed 
mobility between individual pairs of locations. As the first step, we 

determine the Spearman correlation coefficients r^'^'^ between m/y 
c,G,r 



Here Sy denotes the total number of calls made within a circle of and ' ' . Table II shows that the correlation is higher for the 



radius dij centred at /, excluding locations / and j, and k 
malization constant. 



communication model than for the gravity and radiation models 
for both levels of coarse-graining (intra-city, inter-city). In general. 
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Table I | Theestimated values of exponents y (Eq. 2), 3 (Eq. 3), and 
(Eq. 4) for the tower and city levels of coarse-graining. The 
values and their standard errors have been obtained by least 
square fitting to logarithmically binned data 



Level 



intra-city (tower level) 
inter-city (city level) 



2.14: 
2.54: 



0.05 1.20: 
0.05 1.48: 



0.04 0.98: 
0.05 1.08: 



0.02 
: 0.05 



in terms of the Spearman coefficient, predictions of all models are 
more accurate for intra-city mobility than for inter-city mobility. 

Finally, we consider the differences between the observed and 
predicted mobilities by measuring their relative deviations. For all 
the three models, we define the relative deviations between the 



observed m/y and predicted m?'^'^ as 



C,GM I 



(8) 



where dij takes values between —1 and 1. A deviation of = 0 
implies exact prediction by the model for the pair of locations / 
and j, whereas negative (positive) values indicate under- (over-) 
estimations. We only determine d^j for those pairs of of / and j for 
which m;,- 0. 



The probability distributions p^^^'^'^^ shown in Fig. 3 confirm 

the above finding that out of the studied three models for inferring 
mobility from call data, the communication model has the highest 

accuracy of prediction. The distribution p(^^^ is well centred 
around zero, whereas especially for inter- city mobility the distribu- 
tions p(^^^ and P(^^) show a bias towards under- estimation. In 

more detail, for intra-city mobility, the fractions of location pairs 
with deviations d e [ — 0.25, 0.25] are 13% for the radiation model, 
42% for the gravity model, and 51% for the communication model. 
For inter-city mobility, the corresponding fractions are 20%, 17% 
and 33%. Note that for the gravity model, in spite of the fact that 
the average (my/(SfSy)) follows a -dependence (Fig. 1A,B), there is 
still a significant amount of under- estimation. This indicates that 
there is a broad distribution of the values of {mij/{SiSj)) for a given 
distance, and the average value is not always a good estimator. 

Discussion and conclusion 

The goal of this paper has been to investigate simple models that 
predict the intensities of mobility between two locations on the basis 
of mobile phone call data and their geospatial distance. The motiva- 
tion behind this is to provide ways of predicting mobility in situations 
where accurate information of population size at each location is not 
available; furthermore, the focus is on aggregated call data, mitigating 
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Figure 2 | Comparison between observed and predicted human mobility. The expected mobility intensities (A,B) for the communication 
model, (C,D) for the gravity model, and (E,F) for the radiation model are plotted against the mobility intensities observed in data m^. The 
left panels (A,C,E) correspond to the intra-city analysis and right panels (B,D,F) correspond to inter-city analysis. The boxes provide the region between 
25th and 75th percentiles, and the whiskers correspond to 9th and 91st percentiles of logarithmically binned data. A box is colored green if for a given bin 
the line y = x lies between the 9th and the 91st percentiles of the expected distribution; otherwise it is colored red. 
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Table II | Spearman correlation coefficient between the observed 
and predicted mobility values for the three models. For both intra- 
city and inter-city analyses the communication model shov/s larger 
correlation values than gravity and radiation models. The signifi- 
cance of the difference in the correlation is indicated bythep-values 



Level 



p(rf>rf)p{rf>,f) 



intra-city (tower level) 
inter-city (city level) 



0.87 
0.74 



0.81 
0.67 



0.82 
0.67 



<io- 
<io- 



<io- 
<io- 



the need to track movement patterns of individual phone users. Our 
study is based on call and mobility data released by Orange for Ivory 
Coast; note that it would be important to veriiy the findings with data 
from other countries. 

We have tested three models that only take aggregated call data 
and geospatial information as inputs: the well-known gravity model, 
the communication model based on the number of calls between two 
locations, and a modified version of the radiation model. While all 
models on average capture the real mobility patterns derived from 
call data with location information, a more detailed analysis of the 
prediction accuracy at the level of individual locations reveals that 
the communication model is the most accurate out of the three tested 
models in this setting. 

Note that the gravity and radiation models were originally 
designed to use geospatial population information as input para- 
meters. Since our aim has been to study mobility models in a setting 
where such information is not available, we have simply taken the 
number of calls at a given location as a proxy of the population size. 
Therefore we do not claim that the communication model would 
outperform other models in a situation where they could be applied 
as their designers intended. Also note that our modeling target - the 
mobility pattern - is also derived from mobile phone records, and 
geospatial biases in mobile phone usage might influence the results. 
Hence, it would be useful to verify the accuracy of the communica- 
tion model for a case where there are alternative sources of mobility 
information. 

The likely reason why the communication model works well is that 
it directly incorporates geospatial information on social ties and 
human relationships. It has been observed earlier that individuals 
tend to travel to locations where they have social bonds^; further- 
more, once under way, it is reasonable to assume that people make 
calls back home. Because of this, the aggregated intensity of com- 



munication between two locations should contain information on 
the mobility patterns as well. Then, in the first approximation one 
might assume that the frequency of movement between two locations 
is directly proportional to the intensity of communication. Further, 
the simplest way to incorporate the fact that larger distances imply 
larger travel costs (in terms of time or money) is to assume that 
mobility is inversely proportional to distance. These two components 
directly yield the communication model: rriij oc Cij/dij. 

It is worth noting that in general, in gravity laws of human inter- 
action, the distance dependence is associated with some exponent a. 
This is also seen in our analysis of the gravity laws for mobility and 
communication intensity, where the exponents were seen to depend 
on the level of coarse-graining, i.e., intra-city or inter-city. However, 
for both levels, the inverse distance dependence of the communica- 
tion model is approximately linear, i.e., the exponent equals one. This 
suggests universality and calls for analysis of similar data sets from 
different countries. 

Methods 

Communication and mobility data. The data set^^ consists of 2.5 million call detail 
records of customers for a single provider (Orange) in Ivory Coast between December 
1st, 2011 and April 28th, 2012. The communication data used in this paper contains 
the number of calls as well as their aggregated duration between all pairs of 1231 
towers, i.e., mobile base stations. The geographical locations of the towers were also 
provided. The temporal resolution of the data set is one hour. 

The mobility sample consists of ten data sets of trajectories of individual users, each 
for 50,000 randomly chosen users. Each trajectory corresponds to the subscribers' call 
locations during a two-week period. The locations were recorded every time a call was 
made and correspond to the position of the tower that transmitted the call. The data 
sets represent consecutive two- week periods, beginning in December 5, 2011. 

Determining city boundaries. As the locations of the cell-towers were provided, we 
used reverse geocoding^^ to determine the city in which the tower is located. The mean 
longitude and latitude of all towers within a city defines the centre of the city. This 
location was used to calculate the inter-city distances. Out of the 1231 mobile phone 
towers, 686 are located within city boundaries (with 298 of them in the largest city, 
Abidjan). The total number of cities with at least a single tower is 143. 

Determining direct movements. Given the individual trajectories of users, a variety 
of methods have been developed to extract different aspects of human mobility^^. 
Here, we consider direct movements that correspond to any consecutive changes in 
the location of a user. Formally, direct movements are defined as follows: if the user 
made a call from location / at some time t and j is the location of the next call at t' > t, 
there is a direct movement from / to j if j 7^ i. By aggregating this information for all 
users we determine, the total number of direct movements between all pairs of 
locations. The locations can correspond either to towers (intra-city analysis) or to 
cities (inter-city analysis). Note that for inter-city analysis, only towers located within 
city boundaries are considered. Thus, all calls and direct movements to locations 
between cities are ignored. 



intra-city 



inter-city 



-Communication Model 
-Gravity Model 
-Radiation Model 




-0.5 0 0.5 
Relative deviation S 



-0.5 0 0.5 
Relative deviation S 



Figure 3 | Relative deviation between the observed and predicted mobility values for the three models. Distribution Py^ij' ' j of the relative deviations 
^C,GM (Eq^ 8) foj. (A) intra-city and (B) inter-city mobility. 
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Data filtering. Users may be located in areas covered by several towers. In this case, 
the calls made by users at the same location can be handled by different neighbouring 
towers. This phenomena of switching of mobile phone calls between towers is called 
handover and it may give rise to artefacts in mobility and communication. For 
instance, let us consider an immobile user located in the boundary area covered by two 
towers / and j. If one of the calls of this user was served by tower / and the subsequent 
call by tower j, the data will indicate movement of the user from tower / to tower j. 
Similarly, the number of calls between neighbouring towers might also get biased. To 
get rid of this artefact, we excluded all pairs of neighbouring towers from our analysis. 
As the towers are heterogeneously distributed (higher concentration in densely 
populated areas and lower concentration in rural zones), neighbouring towers were 
identified by a distance-independent approach. To do this, we first computed the 
Voronoi diagram around each tower. The towers having a common edge in their 
Voronoi cells are defined as the neighbouring towers. We also excluded the 
communication and mobility between the towers that are located within 1 meter from 
each other (e.g. two base stations serving a busy area). Further, only pairs of locations 
with more than one call per day (on average) were considered. 
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